
    A test of independence in two-way contingency tables based on maximal correlation

    Maximal correlation has several desirable properties as a measure of dependence, including the fact that it vanishes if and only if the variables are independent. Except for a few special cases, it is hard to evaluate maximal correlation explicitly. We focus on two-dimensional contingency tables and discuss a procedure for estimating maximal correlation, which we use for constructing a test of independence. We compare the maximal correlation test with other tests of independence by Monte Carlo simulations. When the underlying continuous variables are dependent but uncorrelated, we point out some cases for which the new test is more powerful. © Taylor & Francis Group, LLC
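A minimal sketch of a plug-in estimate, assuming the standard characterisation of maximal correlation for finitely many categories: it equals the second-largest singular value of the matrix with entries p_ij / sqrt(p_i. p_.j), where the p's are relative frequencies from the table (the largest singular value is always 1). This is not necessarily the estimation procedure used in the paper, and the function name `maximal_correlation` and parameter `iters` are hypothetical. Power iteration on the deflated Gram matrix is used only so the sketch needs no third-party libraries.

```python
import math


def maximal_correlation(table, iters=500):
    """Plug-in estimate of maximal correlation from a two-way table of counts.

    Uses the characterisation rho = second-largest singular value of
    B[i][j] = p_ij / sqrt(p_i. * p_.j); the largest singular value is always 1.
    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    p = [[count / n for count in row] for row in table]
    r = [sum(row) for row in p]              # row marginals p_i.
    c = [sum(col) for col in zip(*p)]        # column marginals p_.j
    rows, cols = len(r), len(c)
    b = [[p[i][j] / math.sqrt(r[i] * c[j]) for j in range(cols)]
         for i in range(rows)]
    # Gram matrix G = B^T B; its eigenvalues are the squared singular values.
    g = [[sum(b[i][s] * b[i][t] for i in range(rows)) for t in range(cols)]
         for s in range(cols)]
    # Deflate the trivial eigenpair (eigenvalue 1, eigenvector sqrt(p_.j)).
    v = [math.sqrt(cj) for cj in c]
    m = [[g[s][t] - v[s] * v[t] for t in range(cols)] for s in range(cols)]
    # Power iteration for the largest remaining eigenvalue (M is symmetric PSD).
    x = [float(j + 1) for j in range(cols)]
    lam = 0.0
    for _ in range(iters):
        y = [sum(m[s][t] * x[t] for t in range(cols)) for s in range(cols)]
        norm = math.sqrt(sum(t * t for t in y))
        if norm < 1e-15:                     # M is (numerically) zero: independence
            return 0.0
        x = [t / norm for t in y]
        lam = norm
    return math.sqrt(lam)


if __name__ == "__main__":
    print(maximal_correlation([[2, 4], [3, 6]]))    # independent table: ~0
    print(maximal_correlation([[10, 20], [30, 40]]))
```

The estimate is zero exactly when the table of relative frequencies factorises into its margins, mirroring the "vanishes if and only if independent" property. For a 2×2 table every function of a binary variable is affine, so the estimate reduces to the absolute phi coefficient.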

    Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms

    In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach to this drawback has been to apply an arbitrary criterion to break ties between distances, which yields different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that groups more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams formula that enables the algorithm to be implemented recursively.
    Comment: Free Software for Agglomerative Hierarchical Clustering using Multidendrograms available at http://deim.urv.cat/~sgomez/multidendrograms.ph
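A minimal sketch of the variable-group idea, under two simplifying assumptions: inter-cluster distances use single linkage computed directly from the point distances (not the paper's generalized Lance and Williams recursion), and tied pairs that share a cluster are merged transitively into one group. The names `variable_group_merges`, `_find`, and the tie tolerance `tol` are hypothetical.

```python
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def variable_group_merges(dist, tol=1e-12):
    """Variable-group agglomeration with single linkage (sketch).

    dist: symmetric matrix of point-to-point distances.
    Returns (height, merged_clusters) steps; one step may merge more than
    two clusters when several minimum distances tie.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    steps = []

    def linkage(a, b):
        # Single linkage: smallest point-to-point distance between clusters.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        pairs = {(a, b): linkage(clusters[a], clusters[b])
                 for a, b in combinations(range(len(clusters)), 2)}
        d_min = min(pairs.values())
        # Join every pair at the (tied) minimum distance, transitively.
        parent = list(range(len(clusters)))
        for (a, b), d in pairs.items():
            if d - d_min <= tol:
                parent[_find(parent, a)] = _find(parent, b)
        groups = {}
        for idx, cluster in enumerate(clusters):
            groups.setdefault(_find(parent, idx), []).append(cluster)
        new_clusters = []
        for members in groups.values():
            if len(members) > 1:
                steps.append((d_min, sorted(sorted(m) for m in members)))
            new_clusters.append(frozenset().union(*members))
        clusters = new_clusters
    return steps


if __name__ == "__main__":
    # Points 0, 1, 2 on a line: d(0,1) = d(1,2) = 1 is a tie.
    print(variable_group_merges([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))
    # → [(1, [[0], [1], [2]])]
```

A pair-group method would have to pick either {0, 1} or {1, 2} first, giving two different dendrograms; here all three clusters join in a single step at height 1.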

    Evaluating observed versus predicted forest biomass: R-squared, index of agreement or maximal information coefficient?

    The accurate prediction of forest above-ground biomass is nowadays key to implementing climate change mitigation policies, such as reducing emissions from deforestation and forest degradation. In this context, the coefficient of determination (R²) is widely used to evaluate the proportion of variance in the dependent variable explained by a model. However, the validity of R² for comparing observed versus predicted values has been challenged in the presence of bias, for instance in remote sensing predictions of forest biomass. We tested suitable alternatives, e.g. the index of agreement (d) and the maximal information coefficient (MIC). Our results show that d yields systematically higher values than R², and may easily lead to models that include an unrealistic number of predictors being regarded as reliable. Results seemed better for MIC, although MIC favoured local clustering of predictions, whether or not they corresponded to the observations. Moreover, R² was more sensitive to the use of cross-validation than d or MIC, and more robust against overfitted models. Therefore, we discourage the use of statistical measures alternative to R² for evaluating model predictions versus observed values, at least in the context of assessing the reliability of modelled biomass predictions using remote sensing. For those who consider d to be conceptually superior to R², we suggest using its square, d², to be more analogous to R² and hence facilitate comparison across studies.
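A small sketch of how d, d² and R² can disagree under bias. The abstract does not give formulas, so two assumptions are made: d is taken to be Willmott's index of agreement, d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², and R² is taken as the squared Pearson correlation between observed and predicted values (another common convention, 1 − SS_res/SS_tot, penalises bias and would give a different number). Function names and the toy data are hypothetical.

```python
def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (assumed form of the d in the text)."""
    o_bar = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    # Potential error; assumes not all values equal the observed mean.
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                    for o, p in zip(obs, pred))
    return 1.0 - sse / potential


def r_squared_pearson(obs, pred):
    """R² taken here as the squared Pearson correlation of observed vs predicted."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0]
    pred = [2.0, 3.0, 4.0]  # perfectly correlated but biased by +1
    d = index_of_agreement(obs, pred)
    # Squared-Pearson R² is 1.0 despite the bias; d = 1 - 3/11 ≈ 0.727.
    print(r_squared_pearson(obs, pred), d, d ** 2)
```

With this toy data the squared Pearson correlation ignores the constant offset entirely, which is the kind of bias problem the abstract refers to; d reacts to it, and squaring d, as suggested above, puts it on a scale closer to R².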

    Distribution of antioxidant components in roots of different red beet (Beta vulgaris L.) cultivars

    Beetroot is typically eaten in winter as pickles or juice, but its nutritional value would merit more frequent consumption. Its curative effect is largely due to its vitamins, minerals, and compounds with antioxidant activity. However, the distribution of biologically active compounds differs considerably between parts of the root. We compared the morphology and several internal constituents (soluble solid content, colour, betacyanin, betaxanthin and polyphenol contents, antioxidant activity, and some flavonoids) of two beetroot cultivars. The morphological investigations showed that the ‘Cylindre’ cultivar had more favourable crop parameters than the ‘Alto F1’ cultivar. In ‘Cylindre’, the polyphenol content and the antioxidant capacity were significantly higher than in ‘Alto F1’. Determination of the betalain contents of the investigated beetroots showed that both betacyanin and betaxanthin contents were higher in ‘Cylindre’. Chlorogenic acid, gallic acid, and coumaric acid were identified from HPLC peaks in the studied beetroot cultivars.

    Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution

    Many modern statistical applications involve inference for complicated stochastic models for which the likelihood function is difficult or even impossible to calculate, and hence conventional likelihood-based inferential techniques cannot be used. In such settings, Bayesian inference can be performed using Approximate Bayesian Computation (ABC). However, despite many recent developments in ABC methodology, in many applications the computational cost of ABC necessitates the choice of summary statistics and tolerances that can potentially severely bias the estimate of the posterior. We propose a new “piecewise” ABC approach suitable for discretely observed Markov models that involves writing the posterior density of the parameters as a product of factors, each a function of only a subset of the data, and then using ABC within each factor. The approach has the advantage of side-stepping the need to choose a summary statistic, and it enables a stringent tolerance to be set, making the posterior “less approximate”. We investigate two methods for estimating the posterior density based on ABC samples for each of the factors: the first is to use a Gaussian approximation for each factor, and the second is to use a kernel density estimate. Both methods have their merits. The Gaussian approximation is simple, fast, and probably adequate for many applications. Using a kernel density estimate instead has the benefit of consistently estimating the true piecewise ABC posterior as the number of ABC samples tends to infinity. We illustrate the piecewise ABC approach with four examples; in each case, the approach offers fast and accurate inference.
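A minimal sketch of the Gaussian-approximation variant, under assumptions that are not in the abstract: a hypothetical scalar AR(1) toy model xᵢ = θ·xᵢ₋₁ + N(0, 1), a Gaussian prior N(μ₀, σ₀²), and plain rejection ABC within each factor. For a Markov chain with m transitions, one way to write the product is π(θ | x) ∝ π(θ)^(1−m) ∏ φᵢ(θ), where φᵢ(θ) ∝ π(θ) f(xᵢ | xᵢ₋₁, θ) depends only on one pair of consecutive observations. Fitting a Gaussian to each factor's ABC sample makes the product Gaussian in closed form. All function names are hypothetical.

```python
import random
import statistics


def simulate_ar1(theta, n, rng):
    """Hypothetical toy Markov model: x_i = theta * x_{i-1} + N(0, 1), x_0 = 0."""
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(theta * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def abc_factor(x_prev, x_next, prior_mu, prior_sd, tol, n_accept, rng):
    """Rejection ABC for one factor, using only the pair (x_prev, x_next)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.gauss(prior_mu, prior_sd)        # propose from the prior
        y = theta * x_prev + rng.gauss(0.0, 1.0)     # simulate one transition
        if abs(y - x_next) <= tol:                   # compare raw data, no summary statistic
            accepted.append(theta)
    return accepted


def combine_gaussian_factors(means, variances, prior_mu, prior_sd):
    """Product pi(theta)^(1-m) * prod_i N(theta; mu_i, s_i^2) for a Gaussian prior."""
    m = len(means)
    prior_prec = 1.0 / prior_sd ** 2
    prec = sum(1.0 / s2 for s2 in variances) + (1 - m) * prior_prec
    if prec <= 0:
        raise ValueError("combined precision is not positive")
    weighted = sum(mu / s2 for mu, s2 in zip(means, variances))
    mean = (weighted + (1 - m) * prior_mu * prior_prec) / prec
    return mean, 1.0 / prec


def piecewise_abc(xs, prior_mu=0.0, prior_sd=1.0, tol=0.3, n_accept=300, seed=1):
    """Gaussian-approximation piecewise ABC: one ABC run per transition."""
    rng = random.Random(seed)
    means, variances = [], []
    for x_prev, x_next in zip(xs, xs[1:]):
        sample = abc_factor(x_prev, x_next, prior_mu, prior_sd, tol, n_accept, rng)
        means.append(statistics.mean(sample))
        variances.append(statistics.variance(sample))
    return combine_gaussian_factors(means, variances, prior_mu, prior_sd)


if __name__ == "__main__":
    data = simulate_ar1(0.6, 30, random.Random(0))
    print(piecewise_abc(data))  # (posterior mean, posterior variance)
```

This toy model is conjugate, so the exact posterior is available for checking: with μ₀ = 0 its mean is Σ xᵢ₋₁xᵢ / (1/σ₀² + Σ xᵢ₋₁²). Because each factor involves a single transition, the tolerance can be tight without acceptance collapsing; the kernel density option mentioned above would replace the closed-form product with a numerical one.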